Software understanding: Automatic classification of software identifiers
نویسندگان
چکیده
Identifier names (e.g., packages, classes, methods, variables) are one of most important software comprehension sources. Identifier names need to be analyzed in order to support collaborative software engineering and to reuse source codes. Indeed, they convey domain concept of softwares. For instance, “getMinimumSupport” would be associated with association rule concept in data mining softwares, while some are difficult to recognize such as the case of mixing parts of words (e.g., “initFeatSet”). We thus propose methods for assisting automatic software understanding by classifying identifier names into domain concept categories. An innovative solution based on data mining algorithms is proposed. Our approach aims to learn character patterns of identifier names. The main challenges are (1) to automatically split identifier names into relevant constituent subnames (2) to build a model associating such a set of subnames to predefined domain concepts. For this purpose, we propose a novel manner for splitting such identifiers into their constituent words and use Ngrams based text classification to predict the related domain concept. In this article, we report the theoretical method and the algorithms we propose, together with the experiments run on real software source codes that show the interest of our approach.
منابع مشابه
Classification of Habitats in Bakhtegan wetland (Fars Province) in Mediterranean Wetland System
Understanding ecosystems and their zonation based on their characteristics and sensitivities is an essential step in protecting natural resources and the sustainability of their habitats. The Mediterranean Wetland Habitat Classification System (MEDWET) is used to identify Bakhtegan wetland, Fars Province, Iran, by hierarchical classification of their habitats, in which they are identified and d...
متن کاملA CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images
Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...
متن کاملAutomatic derivation of domain terms and concept location based on the analysis of the identifiers
Developers express the meaning of the domain ideas in specifically selected identifiers and comments that form the target implemented code. Software maintenance requires knowledge and understanding of the encoded ideas. This paper presents a way how to create automatically domain vocabulary. Knowledge of domain vocabulary supports the comprehension of a specific domain for later code maintenanc...
متن کاملAutomatic Labeling of the Object-oriented Source Code: The Lotus Approach
Most of open-source software systems become available on the internet today. Thus, we need automatic methods to label software code. Software code can be labeled with a set of keywords. These keywords in this paper referred as software labels. The goal of this paper is to provide a quick view of the software code vocabulary. This paper proposes an automatic approach to document the object-orien...
متن کاملComparative evaluation of open source software for mapping between metabolite identifiers in metabolic network reconstructions: application to Recon 2
BACKGROUND An important step in the reconstruction of a metabolic network is annotation of metabolites. Metabolites are generally annotated with various database or structure based identifiers. Metabolite annotations in metabolic reconstructions may be incorrect or incomplete and thus need to be updated prior to their use. Genome-scale metabolic reconstructions generally include hundreds of met...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Intell. Data Anal.
دوره 19 شماره
صفحات -
تاریخ انتشار 2015